Mapping of Sequence Reads to the Reference Genomes ◾ 61
string, $, which is also the first character of F column. The character before $ will be the
first character of L column. In general, the character which comes before can be inferred
with the pairs L and F relationships as shown in Figure 2.10. These relationships are also
illustrated in Figure 2.11 which shows that, first, we recover the first two columns of the
BWM by creating a two-column matrix from columns L and F and sorting it as shown
in Figure 2.11a. The third column is recovered by creating a three-column matrix from
column L and the first two recovered columns and then we sort it (Figure 2.11b). The same
is repeated to recover the fourth column of the sorted rotations as shown in Figure 2.11c.
We will continue like this until we recover all columns and the BWM will be as shown
in Figure 2.11d. The original string is the one ends with the character “$” in the BWM as
indicated in the shaded row in Figure 2.11.
The steps shown in Figures 2.10 and 2.11 are to show how a BWT is reversible. However,
there are several algorithms implemented by software for reversing a BWT using LF
mapping. The BWT is used as data structure for indexing a genome (compressed or
uncompressed) by transforming the entire genome into a BWT. Once a genome has been
transformed, there are several searching algorithms and approaches for finding the posi-
tion of a substring (read) on the reference genome.
FIGURE 2.10 LF mapping to reverse the BWT.
FIGURE 2.11 Using LF mapping to reverse the sorted rotations.